An Overview of Resources and Basic Tools for the Processing of Serbian Written Texts

Authors

  • Duško Vitas
  • Cvetana Krstev
  • Ivan Obradović
  • Ljubomir Popović
  • Gordana Pavlović-Lažetić
Abstract

In this paper we describe the resources and tools for the processing of texts written in Serbian. Most of the resources have been developed within the University of Belgrade NLP group located at the Faculty of Mathematics. The main features of these resources, namely the available monolingual and multilingual corpora and various e-dictionaries, are briefly described. The use of Intex, the main tool of the NLP group, for the recognition of unknown words, text tagging, building local grammars, and disambiguation is outlined.

Similar resources

Processing Serbian Written Texts: An Overview of Resources and Basic Tools

In this paper we describe the resources and tools for the processing of texts written in Serbian that have been developed within the University of Belgrade NLP group located at the Faculty of Mathematics. The main features of these resources, namely the available monolingual and multilingual corpora and various e-dictionaries, are briefly described. The use of Intex, the main tool of the NLP group, ...

Metadiscourse Markers Revisited in EFL Context: The Case of Iranian Academic Learners’ Perception of Written Texts

Moving in line with the postulation that metadiscourse (MD) markers help transform a dry and tortuous piece of text into a coherent and reader-friendly one, the researchers in the current study attempted to investigate the effect different metadiscourse markers might have on Iranian EFL learners’ perception of written texts. To this end, 120 undergraduate English students were given three diffe...

The Extent of Using the Basic Vocabulary in the First Grade Quran Textbook

S. B. Alavi Moghaddam, Ph.D. Textbooks need to be written in such a way that their readers can understand the written texts. One way of ensuring this objective in first grade textbooks would be the use of basic vocabulary, as determined by Ne'matzadeh et al. (1384). Considering the importance of the Quran text...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Automatic Recognition of Composite Verb Forms in Serbian

In this paper, we will present the work on building a shallow parser for recognizing composite verb forms in Serbian – the forms that consist of an auxiliary verb and a main verb. The parser is made in Unitex, a corpus processing software, in the form of local grammars that rely on using morphological dictionaries of Serbian. The model was tested on a small corpus of texts, both written in Serb...

Publication date: 2003